The Impact of Automatic Morphological Analysis & Disambiguation on Dependency Parsing of Turkish
نویسنده
چکیده
The studies on dependency parsing of Turkish so far gave their results on the Turkish Dependency Treebank. This treebank consists of gold standard sentences where part-of-speech tags are manually assigned to each word and the words forming multi word expressions are also manually determined and combined into single units. For the first time, we investigate the results of parsing Turkish sentences from scratch and observe the accuracy drop at the end of processing raw data. We test one state-of-the art morphological analyzer together with two different morphological disambiguators. We both show separately the accuracy drop due to the automatic morphological processing and to the lack of multi word unit extraction. With this purpose, we use and present a new version of the Turkish Treebank where we detached the multi word expressions (MWEs) into multiple tokens and manually annotated the missing part-of-speech tags of these new tokens.
منابع مشابه
Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing
So far predicted scenarios for Turkish dependency parsing have used a morphological disambiguator that is trained on the data distributed with the tool(Sak et al., 2008). Although models trained on this data have high accuracy scores on the test and development data of the same set, the accuracy drastically drops when the model is used in the preprocessing of Turkish Treebank parsing experiment...
متن کاملTesting the Effect of Morphological Disambiguation in Dependency Parsing of Basque
This paper presents a set of experiments performed on parsing Basque, a morphologically rich and agglutinative language, studying the effect of using the morphological analyzer for Basque together with the morphological disambiguation module, in contrast to using the gold standard tags taken from the treebank. The objective is to obtain a first estimate of the effect of errors in morphological ...
متن کاملMorphological and Syntactic Case in Statistical Dependency Parsing
Most morphologically rich languages with free word order use case systems to mark the grammatical function of nominal elements, especially for the core argument functions of a verb. The standard pipeline approach in syntactic dependency parsing assumes a complete disambiguation of morphological (case) information prior to automatic syntactic analysis. Parsing experiments on Czech, German, and H...
متن کاملParsing Morphologically Rich Languages with (Mostly) Off-The-Shelf Software and Word Vectors
As a contribution to the 2014 SPMRL shared task on parsing morphologically rich languages, we show that it is now possible to achieve high dependency accuracy using existing parsers without the need for intricate multi-parser schemes even if only small amounts of training data are available. We further show that the impact of using word vectors on parsing quality heavily depends on the amount o...
متن کاملStatistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing
This paper gives two contributions to dependency parsing in Korean. First, we build a Korean dependency Treebank from an existing constituent Treebank. For a morphologically rich language like Korean, dependency parsing shows some advantages over constituent parsing. Since there is not much training data available, we automatically generate dependency trees by applying head-percolation rules an...
متن کامل